Exploring Expressive Speech Space in an Audio-book
Authors
Abstract
In this paper, an audio-book in which a professional voice talent performs multiple characters is used to investigate the expressiveness of speech. The expressive speech space of this single speaker is explored by measuring the distances between the acoustic models of the characters and the perceived proximity between their speech utterances. Using the speech of ten characters as test data, character confusion is evaluated in both the acoustic space and the perceptual space. The average precision in differentiating one character from the others is 81.7% in the acoustic space and 72.6% in the perceptual space; notably, the objective measure outperforms the subjective one. Furthermore, the acoustic distance between two characters, measured by normalized Kullback-Leibler divergence (NKLD), is highly correlated with the perceptual distance, with a correlation coefficient of 0.814. NKLD can therefore serve as an objective measure of the perceptual similarity between groups of utterances.
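As a rough illustration of the kind of comparison the abstract describes, the sketch below computes a symmetric Kullback-Leibler divergence between two diagonal-covariance Gaussians and a Pearson correlation between two lists of distances. It is not the paper's method: the abstract does not define the NKLD normalization or the form of the acoustic models, so the single-Gaussian model, the function names, and the toy numbers are all assumptions made for illustration.

```python
import math


def kl_diag_gauss(mu_p, var_p, mu_q, var_q):
    """KL(p || q) for two Gaussians with diagonal covariance.

    Assumption: each character is modeled by one diagonal Gaussian;
    the paper's actual acoustic models are not specified in the abstract.
    """
    return 0.5 * sum(
        math.log(vq / vp) + (vp + (mp - mq) ** 2) / vq - 1.0
        for mp, vp, mq, vq in zip(mu_p, var_p, mu_q, var_q)
    )


def symmetric_kld(mu_p, var_p, mu_q, var_q):
    """Symmetrized divergence: KL(p || q) + KL(q || p).

    No normalization is applied here; the 'N' in NKLD is not reproduced.
    """
    return (kl_diag_gauss(mu_p, var_p, mu_q, var_q)
            + kl_diag_gauss(mu_q, var_q, mu_p, var_p))


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    norm_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    norm_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (norm_x * norm_y)


if __name__ == "__main__":
    # Toy, made-up character models (mean and variance per dimension).
    char_a = ([0.0, 1.0], [1.0, 2.0])
    char_b = ([0.5, 0.0], [1.5, 1.0])
    print("acoustic distance A-B:", symmetric_kld(*char_a, *char_b))

    # Toy, made-up distance lists for three character pairs.
    acoustic = [0.2, 0.9, 1.4]
    perceptual = [0.1, 0.6, 0.8]
    print("correlation:", pearson(acoustic, perceptual))
```

In such a setup, the acoustic distances for all character pairs would be paired with listener-derived distances for the same pairs, and the correlation between the two lists would indicate how well the objective measure tracks perception.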
Similar resources
Exploring Rich Expressive Information from Audiobook Data Using Cluster Adaptive Training
Audiobook data is a freely available source of rich expressive speech data. To accurately generate speech of this form, expressiveness must be incorporated into the synthesis system. This paper investigates two parts of this process: the representation of expressive information in a statistical parametric speech synthesis system; and whether discrete expressive state labels can sufficiently rep...
Exploring EFL Learners’ Use of Formulaic Sequences in Pragmatically Focused Role-play Tasks
Communicative language use largely entails regular patterns consisting of pre-constructed phrases or sequences. These sequences have been examined by many researchers to find the situation-based formulas which may help L2 learners follow a possibly more target-like speaking system. This study, therefore, explored two categories of formulaic expressions including speech formulas and situation-bo...
Acoustic and Visual Analysis of Expressive Speech: A Case Study of French Acted Speech
Within the framework of developing an expressive audiovisual speech synthesis, an acoustic and visual analysis of expressive acted speech is proposed in this paper. Our purpose is to identify the main characteristics of audiovisual expressions that need to be integrated during synthesis to provide believable emotions to the virtual 3D talking head. We conducted a case study of a semi-profession...
Acoustic quality assessment at Nezamol molk dome of Jame mosque of Isfahan
Incontrovertibly, hearing is one of the most important of the five human senses. The human ear receives sound and transmits it to the brain through the auditory organs. Hence, sound can be considered one of the key tools humans use to communicate with each other and with the environment around them. Since acoustics has a profound impact on the body, soul, and the performance of human ...
The ILSP Text-to-Speech System for the Blizzard Challenge 2012
This paper describes the ILSP and INNOETICS speech synthesis system entry for the Blizzard Challenge 2012. A description of the underlying system and the techniques used is provided, as well as information about the voice building process and a discussion of the obtained evaluation results. Additional focus will be given to new processes or techniques we used this year in comparison to our previous part...